Knowledge Discovery in Biosequences Using Sort Regular Patterns
نویسندگان
چکیده
This paper considers knowledge discovery by sort regular patterns, which are strings over sort letters representing nite sets of basic letters. We devise a learning algorithm for the class based on the minimal multiple generalization technique, and evaluate the method by experiments on biosequences from GenBank database. The experiments show that relatively a simple sort pattern can represent a complex motif in biosequences, and the learning algorithm works well in noisy examples.
منابع مشابه
Reports in Informatics Approaches to the Automatic Discovery of Patterns in Biosequences
Approaches to the automatic discovery of patterns in biosequences. Abstract This paper is a survey of approaches and algorithms used for the automatic discovery of patterns in biosequences. Patterns with the expressive power in the class of regular languages are considered, and a classiication of pattern languages in this class is developed, covering those patterns which are the most frequently...
متن کاملApproaches to the Automatic Discovery of Patterns in Biosequences
This paper surveys approaches to the discovery of patterns in biosequences and places these approaches within a formal framework that systematises the types of patterns and the discovery algorithms. Patterns with expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering the patterns that are the most frequen...
متن کاملReports in Informatics Relation Patterns and Their Automatic Discovery in Biosequences Relation Patterns and Their Automatic Discovery in Biosequences
We have extended the pattern language used in PROSITE to enable it to describe dependencies between amino acid residues. We have developed a minimum description length principle based tness measure evaluating the signiicance of such patterns in relation to a set of sequences, and an algorithm automatically nding signiicant patterns in unaligned sequences. Computing experiments are reported show...
متن کاملMeasuring Over-Generalization in the Minimal Multiple Generalizations of Biosequences
We consider the problem of finding a set of patterns that best characterizes a set of strings. To this end, Arimura et. al. [3] considered the use of minimal multiple generalizations (mmg) for such characterizations. Given any sample set, the mmgs are, roughly speaking, the most (syntactically) specific set of languages containing the sample within a given class of languages. Takae et. al. [17]...
متن کاملPattern Discovery from Biosequences
In this thesis we have developed novel methods for analyzing biological data, the primary sequences of the DNA and proteins, the microarray based gene expression data, and other functional genomics data. The main contribution is the development of the pattern discovery algorithm SPEXS, accompanied by several practical applications for analyzing real biological problems. For performing these bio...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007